risk indicator
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
Ielanskyi, Mykyta, Schweighofer, Kajetan, Aichberger, Lukas, Hochreiter, Sepp
Hallucinations are a common issue that undermines the reliability of large language models (LLMs). Recent studies have identified a specific subset of hallucinations, known as confabulations, which arise from the predictive uncertainty of LLMs. To detect confabulations, various methods for estimating predictive uncertainty in natural language generation (NLG) have been developed. These methods are typically evaluated by correlating uncertainty estimates with the correctness of generated text, with question-answering (QA) datasets serving as the standard benchmark. However, commonly used approximate correctness functions disagree substantially with each other and, consequently, in how they rank uncertainty estimation methods. This allows one to inflate the apparent performance of uncertainty estimation methods. We propose using several alternative risk indicators for risk correlation experiments that improve the robustness of the empirical assessment of uncertainty estimation (UE) algorithms for NLG. For QA tasks, we show that marginalizing over multiple LLM-as-a-judge variants reduces evaluation biases. Furthermore, we explore structured tasks as well as out-of-distribution and perturbation detection tasks, which provide robust and controllable risk indicators. Finally, we propose using an Elo rating of uncertainty estimation methods to give an objective summary across extensive evaluation settings.
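The Elo summary proposed in this abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state. Each evaluation setting (for example, one dataset paired with one correctness function) is treated as a round in which every pair of methods plays one match. The method with the higher score (such as AUROC) wins the match, and ratings use the standard chess Elo update with a hypothetical K-factor of 16. The method names and scores in the usage example are invented placeholders, not results from the paper.

```python
from itertools import combinations
from typing import Dict


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(results: Dict[str, Dict[str, float]],
                k: float = 16.0, init: float = 1000.0) -> Dict[str, float]:
    """results maps setting name -> {method name -> score}; higher score wins.

    Assumption: every pair of methods present in a setting plays one match.
    Settings are processed in sorted order because Elo is order-dependent.
    """
    methods = sorted({m for scores in results.values() for m in scores})
    ratings = {m: init for m in methods}
    for setting in sorted(results):
        scores = results[setting]
        for a, b in combinations(sorted(scores), 2):
            if scores[a] > scores[b]:
                s_a = 1.0          # A wins
            elif scores[a] < scores[b]:
                s_a = 0.0          # B wins
            else:
                s_a = 0.5          # draw
            delta = k * (s_a - expected_score(ratings[a], ratings[b]))
            ratings[a] += delta    # zero-sum update keeps the total rating fixed
            ratings[b] -= delta
    return ratings


if __name__ == "__main__":
    # Hypothetical AUROC values for three hypothetical UE methods.
    made_up = {
        "qa_judge_1": {"method_x": 0.71, "method_y": 0.68, "method_z": 0.60},
        "qa_judge_2": {"method_x": 0.66, "method_y": 0.69, "method_z": 0.58},
        "ood_task":   {"method_x": 0.80, "method_y": 0.74, "method_z": 0.77},
    }
    for name, rating in sorted(elo_ratings(made_up).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because the update is zero-sum, the ratings only express relative standing across the settings included. The paper may aggregate settings differently, for example by averaging over several random match orderings.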
Anthropic Has a Plan to Keep Its AI From Building a Nuclear Weapon. Will It Work?
Anthropic partnered with the US government to create a filter meant to block Claude from helping someone build a nuke. Experts are divided on whether it's a necessary protection, or a protection at all. At the end of August, the AI company Anthropic announced that its chatbot Claude wouldn't help anyone build a nuclear weapon. According to Anthropic, it had partnered with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to make sure Claude wouldn't spill nuclear secrets.
- Asia > North Korea (0.14)
- Pacific Ocean (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
- Energy > Power Industry > Utilities > Nuclear (0.88)
Beyond Trend Following: Deep Learning for Market Trend Prediction
Berzal, Fernando, Garcia, Alberto
Trend following and momentum investing are common strategies employed by asset managers. Even though they can be helpful in the right situations, they are limited in that they work only by looking at the past, as if we were driving with our eyes on the rearview mirror. In this paper, we advocate for the use of Artificial Intelligence and Machine Learning techniques to predict future market trends. These predictions, when done properly, can improve the performance of asset managers by increasing returns and reducing drawdowns.
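To make the "rearview mirror" point concrete, here is a textbook trend-following rule, a moving-average crossover. It is not the authors' deep learning method. The window lengths of 3 and 5 are arbitrary illustrative choices. Every position depends only on prices already observed, which is exactly the backward-looking limitation the abstract describes.

```python
from typing import List, Optional


def trailing_mean(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average of the last `window` prices; None until enough history exists."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window: i + 1]) / window)
    return out


def crossover_positions(prices: List[float], short: int = 3, long: int = 5) -> List[int]:
    """+1 (long) if the short average is above the long one, -1 (short) if below, else 0."""
    if short >= long:
        raise ValueError("short window must be smaller than long window")
    fast = trailing_mean(prices, short)
    slow = trailing_mean(prices, long)
    positions: List[int] = []
    for f, s in zip(fast, slow):
        if f is None or s is None or f == s:
            positions.append(0)   # not enough history, or no trend signal
        else:
            positions.append(1 if f > s else -1)
    return positions


if __name__ == "__main__":
    # Hypothetical price path: a rise followed by a reversal.
    path = [10, 11, 12, 13, 14, 13, 12, 11, 10]
    print(crossover_positions(path))
```

A predictive model, as advocated in the paper, would instead try to estimate where the trend is heading. The crossover rule only reacts after the averages have already turned.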
- North America > United States > New York > New York County > New York City (0.14)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending
Sanz-Guerrero, Mario, Arroyo, Javier
Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms. However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers. This paper proposes a novel approach to address this issue by leveraging the textual descriptions provided by borrowers during the loan application process. Our methodology involves processing these textual descriptions using a Large Language Model (LLM), a powerful tool capable of discerning patterns and semantics within the text. Transfer learning is applied to adapt the LLM to the specific task at hand. Our results derived from the analysis of the Lending Club dataset show that the risk score generated by BERT, a widely used LLM, significantly improves the performance of credit risk classifiers. However, the inherent opacity of LLM-based systems, coupled with uncertainties about potential biases, underscores critical considerations for regulatory frameworks and engenders trust-related concerns among end-users, opening new avenues for future research in the dynamic landscape of P2P lending and artificial intelligence.
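The pipeline described above can be illustrated with a deliberately small sketch. In it, a text-derived risk score is appended to tabular features before a credit risk classifier is trained. The sketch assumes the score has already been produced by a fine-tuned language model such as BERT and does not compute it. The feature name `debt_to_income`, the toy rows, and the use of plain logistic regression trained by gradient descent are all illustrative assumptions, not the authors' setup.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on log-loss; the bias is stored as the last weight."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: List[float], x: List[float]) -> float:
    """Predicted default probability for one feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


if __name__ == "__main__":
    # Hypothetical rows: [debt_to_income, text_risk_score]; label 1 = default.
    X = [[0.2, 0.1], [0.3, 0.2], [0.25, 0.9], [0.35, 0.8], [0.2, 0.15], [0.3, 0.85]]
    y = [0, 0, 1, 1, 0, 1]
    w = fit_logistic(X, y)
    print(predict_proba(w, [0.3, 0.9]), predict_proba(w, [0.3, 0.1]))
```

With real data, one would compare this classifier with and without the text column to measure the improvement the abstract reports. The same comparison is where the opacity and bias concerns it raises become testable.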
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Spain > Community of Madrid > Madrid (0.05)
- South America > Chile (0.04)
- Information Technology > Services > e-Commerce Services (1.00)
- Banking & Finance > Loans (1.00)
Assessing Regulatory Risk in Personal Financial Advice Documents: a Pilot Study
Sherchan, Wanita, Harris, Simon, Chen, Sue Ann, Alam, Nebula, Tran, Khoi-Nguyen, Makarucha, Adam J., Butler, Christopher J.
Assessing the regulatory compliance of personal financial advice is currently a complex manual process. In Australia, only 5-15% of advice documents are audited annually, and 75% of these are found to be non-compliant (ASI 2018b). This paper describes a pilot with an Australian government regulatory agency in which Artificial Intelligence (AI) models based on techniques such as natural language processing (NLP), machine learning and deep learning were developed to methodically characterise the regulatory risk status of personal financial advice documents. The solution provides a traffic-light rating of advice documents for various risk factors, enabling comprehensive coverage of documents in the review and allowing rapid identification of documents at high risk of non-compliance with government regulations. This pilot serves as a case study of public-private partnership in developing AI systems for government and the public sector.
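The traffic-light output described above can be sketched as a thresholding step on per-factor risk scores, followed by a triage ordering. This assumes the underlying NLP models already emit a score in [0, 1] for each risk factor. The factor names and the amber/red cut-offs (0.4 and 0.7) are hypothetical, since the abstract does not specify them.

```python
from typing import Dict, List


def traffic_light(score: float, amber: float = 0.4, red: float = 0.7) -> str:
    """Map a risk score in [0, 1] to a traffic-light rating (hypothetical thresholds)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= red:
        return "red"
    if score >= amber:
        return "amber"
    return "green"


def rate_document(factor_scores: Dict[str, float]) -> Dict[str, str]:
    """Rate every risk factor of one advice document."""
    return {factor: traffic_light(s) for factor, s in factor_scores.items()}


def needs_review(ratings: Dict[str, str]) -> bool:
    """Flag a document for human review if any factor is red."""
    return "red" in ratings.values()


def triage(docs: Dict[str, Dict[str, float]]) -> List[str]:
    """Order document ids so those with the most red, then amber, ratings come first."""
    def key(doc_id: str):
        ratings = rate_document(docs[doc_id]).values()
        return (-sum(r == "red" for r in ratings),
                -sum(r == "amber" for r in ratings),
                doc_id)
    return sorted(docs, key=key)


if __name__ == "__main__":
    # Hypothetical documents and risk-factor scores.
    docs = {
        "doc_a": {"fee_disclosure": 0.2, "best_interest": 0.3},
        "doc_b": {"fee_disclosure": 0.8, "best_interest": 0.5},
        "doc_c": {"fee_disclosure": 0.5, "best_interest": 0.1},
    }
    print(triage(docs))
```

Scoring every document this way, instead of a 5-15% audit sample, is what gives the comprehensive coverage the abstract mentions. Reviewers then start from the top of the triage list.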
- Law > Statutes (1.00)
- Banking & Finance > Financial Services (1.00)
- Government > Regional Government > Oceania Government > Australia Government (0.35)
Visual Macroprudential Surveillance of Banks
Sarlin (2016). Intelligent Systems in Accounting, Finance and Management, Wiley.
We create a tool for visual surveillance of the European banking system from a macroprudential perspective. The tool performs visual dynamic clustering with the self-organizing time map (SOTM) to visualize evolving multivariate data from two viewpoints: (i) multivariate cluster structures, and (ii) univariate drivers of changes in structures. In assessing the European banking system, the main tasks the SOTM can be used for are (i) identifying structural changes and breaking points in a large number of risk indicators, and their specific location in the cross-section, and (ii) identifying the build-up of, or generally changes in, individual risk indicators in the banking system as a whole. While the former view provides indications of changes in the banking system, the latter describes the sources of these changes.
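The SOTM idea described above can be sketched as a sequence of one-dimensional self-organizing maps, one per time period. Each period's map starts from the prototypes learned in the previous period, so cluster positions stay comparable over time. This minimal sketch assumes a batch SOM update with a Gaussian neighbourhood on a fixed 1-D grid and lets the caller supply the initial prototypes. The original method's initialization and parameter choices may differ, and the toy data are invented.

```python
import math
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_som_1d(data: List[Vector], prototypes: List[Vector],
                 sigma: float = 1.0, iters: int = 10) -> List[Vector]:
    """Batch-train a 1-D self-organizing map, starting from the given prototypes."""
    m = len(prototypes)
    protos = [list(p) for p in prototypes]
    for _ in range(iters):
        # Best-matching unit (index on the 1-D grid) for every observation.
        bmus = [min(range(m), key=lambda j: squared_distance(x, protos[j])) for x in data]
        updated = []
        for i in range(m):
            # Each prototype becomes a neighbourhood-weighted mean of the data,
            # with Gaussian weights that decay with grid distance |i - bmu|.
            weights = [math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2)) for c in bmus]
            total = sum(weights)
            updated.append([sum(w * x[d] for w, x in zip(weights, data)) / total
                            for d in range(len(data[0]))])
        protos = updated
    return protos


def sotm(data_by_period: List[List[Vector]], initial: List[Vector],
         sigma: float = 1.0, iters: int = 10) -> List[List[Vector]]:
    """One 1-D SOM per period; each map is initialized from the previous period's map."""
    maps: List[List[Vector]] = []
    current = initial
    for data in data_by_period:
        current = batch_som_1d(data, current, sigma=sigma, iters=iters)
        maps.append(current)
    return maps


if __name__ == "__main__":
    # Hypothetical bank-level risk indicators (two indicators) for two periods.
    p1 = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
    p2 = [[0.1, 0.3], [0.3, 0.2], [0.9, 1.1], [1.0, 0.9]]
    init = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    for t, protos in enumerate(sotm([p1, p2], init, sigma=1.0, iters=5)):
        print(t, [[round(v, 3) for v in p] for p in protos])
```

Plotting the prototypes of each period side by side over time gives the view of evolving cluster structures. Tracking one indicator's prototype values over time gives the univariate view of its drivers.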